PAC-Bayesian Bounds for Discrete Density Estimation and Co-clustering Analysis

Authors

  • Yevgeny Seldin
  • Naftali Tishby

Abstract

We apply the PAC-Bayesian framework to derive generalization bounds for co-clustering. The analysis yields regularization terms that were absent in previous formulations of this task. The bounds suggest that co-clustering should optimize a trade-off between its empirical performance and the mutual information that the cluster variables preserve on the row and column indices. Proper regularization enabled us to achieve state-of-the-art results in predicting the missing ratings in the MovieLens collaborative filtering dataset. In addition, we derive a PAC-Bayesian bound for discrete density estimation and show that the PAC-Bayesian bound for classification is a special case of it. We further introduce combinatorial priors to PAC-Bayesian analysis. Combinatorial priors are better suited to discrete domains, whereas Gaussian priors are suited to continuous domains. We show that combinatorial priors lead to regularization terms in the form of mutual information.
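The abstract describes the bounds only qualitatively, as a trade-off between empirical performance and a complexity term. The following is a minimal sketch under stated assumptions, not the authors' derivation. It assumes the commonly used PAC-Bayes-kl form of the classification bound, kl(L̂(Q) ‖ L(Q)) ≤ (KL(Q‖P) + ln(2√n/δ)) / n, whose constants differ slightly between published versions, and inverts it numerically. Separately, it computes the mutual information I(X;C) between a uniformly drawn row index X and its (soft) cluster C, the kind of term the abstract says combinatorial priors produce. All function names are hypothetical, and the code does not reproduce the paper's full co-clustering bound.

```python
import math


def kl_bernoulli(q: float, p: float) -> float:
    """KL divergence kl(q || p) between Bernoulli(q) and Bernoulli(p), in nats."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)  # keep logs finite at the boundaries
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total


def kl_inverse_upper(q: float, bound: float, tol: float = 1e-10) -> float:
    """Largest p in [q, 1] with kl(q || p) <= bound, found by bisection.

    kl(q || p) is increasing in p for p >= q, so bisection is valid.
    """
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kl_bernoulli(q, mid) <= bound:
            lo = mid
        else:
            hi = mid
    return lo


def pac_bayes_kl_bound(emp_loss: float, kl_qp: float, n: int, delta: float) -> float:
    """Upper bound on the expected loss of posterior Q (holds w.p. >= 1 - delta).

    Assumption: PAC-Bayes-kl form with complexity term
    (KL(Q||P) + ln(2*sqrt(n)/delta)) / n; constants vary across versions.
    """
    rhs = (kl_qp + math.log(2 * math.sqrt(n) / delta)) / n
    return kl_inverse_upper(emp_loss, rhs)


def mutual_information(q_c_given_x: list[list[float]]) -> float:
    """I(X;C) in nats for row indices X drawn uniformly, given soft assignments q(c|x).

    Illustrative only: this is the form of regularizer the abstract mentions,
    not the paper's co-clustering bound.
    """
    n_rows = len(q_c_given_x)
    n_clusters = len(q_c_given_x[0])
    # Cluster marginal q(c) = (1/n) * sum_x q(c|x).
    q_c = [sum(row[c] for row in q_c_given_x) / n_rows for c in range(n_clusters)]
    mi = 0.0
    for row in q_c_given_x:
        for c, q in enumerate(row):
            if q > 0:
                mi += (1 / n_rows) * q * math.log(q / q_c[c])
    return mi
```

For example, a hard, balanced assignment of four rows to two clusters, `mutual_information([[1, 0], [1, 0], [0, 1], [0, 1]])`, gives ln 2 ≈ 0.693 nats, while putting every row in a single cluster gives 0. Passing a larger KL(Q‖P) to `pac_bayes_kl_bound` yields a looser (or equal) bound, which mirrors the trade-off between fit and complexity described above.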


Similar resources

PAC-Bayesian Generalization Bound for Density Estimation with Application to Co-clustering

We derive a PAC-Bayesian generalization bound for density estimation. Similar to the PAC-Bayesian generalization bound for classification, the result has the appealingly simple form of a tradeoff between empirical performance and the KL-divergence of the posterior from the prior. Moreover, the PAC-Bayesian generalization bound for classification can be derived as a special case of the bound for ...

PAC-Bayesian Analysis of Co-clustering and Beyond

We derive PAC-Bayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as co-clustering, matrix tri-factorization, graphical models, graph clustering, and pairwise clustering. We begin with the analysis of co-clustering, which is a widely used approach to the analysis of data matrices. We distinguish between two tasks in matrix data analysis: discr...

A PAC-Bayesian Analysis of Co-clustering, Graph Clustering, and Pairwise Clustering

We briefly review the PAC-Bayesian analysis of co-clustering (Seldin and Tishby, 2008, 2009, 2010), which provided generalization guarantees and regularization terms absent in the preceding formulations of this problem and achieved state-of-the-art prediction results in the MovieLens collaborative filtering task. Inspired by this analysis we formulate weighted graph clustering as a prediction probl...

PAC-Bayesian Analysis of Co-clustering with Extensions to Matrix Tri-factorization, Graph Clustering, Pairwise Clustering, and Graphical Models

This paper promotes a novel point of view on unsupervised learning. We argue that the goal of unsupervised learning is to facilitate a solution of some higher level task, and that it should be evaluated in terms of its contribution to the solution of this task. We present an example of such an analysis for the case of co-clustering, which is a widely used approach to the analysis of data matric...

Comparison of Estimates Using Record Statistics from Lomax Model: Bayesian and Non Bayesian Approaches

This paper addresses the problem of Bayesian estimation of the parameters, reliability and hazard function in the context of record statistics values from the two-parameter Lomax distribution. The ML and Bayes estimates based on records are derived for the two unknown parameters and the survival time parameters, reliability and hazard functions. The Bayes estimates are obtained based on conju...

Publication date: 2010